Goto

Collaborating Authors

 regularized empirical risk minimization


Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization

Neural Information Processing Systems

We develop a new accelerated stochastic gradient method for efficiently solving the convex regularized empirical risk minimization problem in mini-batch settings. The use of mini-batches has become a golden standard in the machine learning community, because the mini-batch techniques stabilize the gradient estimate and can easily make good use of parallel computing. The core of our proposed method is the incorporation of our new ``double acceleration'' technique and variance reduction technique. We theoretically analyze our proposed method and show that our method much improves the mini-batch efficiencies of previous accelerated stochastic methods, and essentially only needs size $\sqrt{n}$ mini-batches for achieving the optimal iteration complexities for both non-strongly and strongly convex objectives, where $n$ is the training set size. Further, we show that even in non-mini-batch settings, our method achieves the best known convergence rate for non-strongly convex and strongly convex objectives.


An Accelerated Proximal Coordinate Gradient Method

Qihang Lin, Zhaosong Lu, Lin Xiao

Neural Information Processing Systems

We develop an accelerated randomized proximal coordinate gradient (APCG) method, for solving a broad class of composite convex optimization problems. In particular, our method achieves faster linear convergence rates for minimizing strongly convex functions than existing randomized proximal coordinate gradient methods. We show how to apply the APCG method to solve the dual of the regularized empirical risk minimization (ERM) problem, and devise efficient implementations that avoid full-dimensional vector operations. For ill-conditioned ERM problems, our method obtains improved convergence rates than the state-ofthe-art stochastic dual coordinate ascent (SDCA) method.


Reviews: Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization

Neural Information Processing Systems

The paper proposes a novel doubly accelerated variance reduced dual averaging method for solving the convex regularized empirical risk minimization problem in mini batch settings. The method essentially can be interpreted as replacing the proximal gradient update of APG method with the inner SVRG loop and then introducing momentum updates in inner SVRG loops. Finally to allow lazy updated, primal SVRG is replaced with variance reduce dual averaging. The main difference from AccProxSVRG is the introduction of momentum term at the outer iteration level also. The method requires only O(sqrt{n}) sized mini batches to achieve optimal iteration complexities for both convex and non-convex functions when the problem is badly conditioned or require high accuracy.


An Accelerated Proximal Coordinate Gradient Method

Neural Information Processing Systems

We develop an accelerated randomized proximal coordinate gradient (APCG) method, for solving a broad class of composite convex optimization problems. In particular, our method achieves faster linear convergence rates for minimizing strongly convex functions than existing randomized proximal coordinate gradient methods. We show how to apply the APCG method to solve the dual of the regularized empirical risk minimization (ERM) problem, and devise efficient implementations that avoid full-dimensional vector operations. For ill-conditioned ERM problems, our method obtains improved convergence rates than the state-ofthe-art stochastic dual coordinate ascent (SDCA) method.


Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization

Murata, Tomoya, Suzuki, Taiji

Neural Information Processing Systems

We develop a new accelerated stochastic gradient method for efficiently solving the convex regularized empirical risk minimization problem in mini-batch settings. The use of mini-batches has become a golden standard in the machine learning community, because the mini-batch techniques stabilize the gradient estimate and can easily make good use of parallel computing. The core of our proposed method is the incorporation of our new double acceleration'' technique and variance reduction technique. We theoretically analyze our proposed method and show that our method much improves the mini-batch efficiencies of previous accelerated stochastic methods, and essentially only needs size $\sqrt{n}$ mini-batches for achieving the optimal iteration complexities for both non-strongly and strongly convex objectives, where $n$ is the training set size. Further, we show that even in non-mini-batch settings, our method achieves the best known convergence rate for non-strongly convex and strongly convex objectives.


Best-scored Random Forest Classification

Hang, Hanyuan, Liu, Xiaoyu, Steinwart, Ingo

arXiv.org Machine Learning

We propose an algorithm named best-scored random forest for binary classification problems. The terminology "best-scored" means to select the one with the best empirical performance out of a certain number of purely random tree candidates as each single tree in the forest. In this way, the resulting forest can be more accurate than the original purely random forest. From the theoretical perspective, within the framework of regularized empirical risk minimization penalized on the number of splits, we establish almost optimal convergence rates for the proposed best-scored random trees under certain conditions which can be extended to the best-scored random forest. In addition, we present a counterexample to illustrate that in order to ensure the consistency of the forest, every dimension must have the chance to be split. In the numerical experiments, for the sake of efficiency, we employ an adaptive random splitting criterion. Comparative experiments with other state-of-art classification methods demonstrate the accuracy of our best-scored random forest.


Classifying Big Data over Networks via the Logistic Network Lasso

Ambos, Henrik, Tran, Nguyen, Jung, Alexander

arXiv.org Machine Learning

ABSTRACT We apply network Lasso to solve binary classification (clustering) problems on network structured data. To this end, we generalize ordinary logistic regression to non-Euclidean data defined over a complex network structure. A scalable classification algorithm is obtained by applying the alternating direction methods of multipliers to solve this optimization problem. Index Terms-- compressed sensing, big data over networks, semi-supervised learning, classification, clustering, complex networks, convex optimization I. INTRODUCTION We consider the problem of classifying or clustering a large set of data points which conform to an underlying network structure. Such network-structured datasets arise in a wide range of application domains, e.g., image-and video processing as well as social networks [1].


An Accelerated Proximal Coordinate Gradient Method

Lin, Qihang, Lu, Zhaosong, Xiao, Lin

Neural Information Processing Systems

We develop an accelerated randomized proximal coordinate gradient (APCG) method, for solving a broad class of composite convex optimization problems. In particular, our method achieves faster linear convergence rates for minimizing strongly convex functions than existing randomized proximal coordinate gradient methods. We show how to apply the APCG method to solve the dual of the regularized empirical risk minimization (ERM) problem, and devise efficient implementations that can avoid full-dimensional vector operations. For ill-conditioned ERM problems, our method obtains improved convergence rates than the state-of-the-art stochastic dual coordinate ascent (SDCA) method.